A method of constructing syllable level Tibetan text classification corpus
Abstract
A corpus is an indispensable ingredient for statistical NLP research and real-world applications; therefore, the method used to construct a corpus directly affects various downstream tasks. This paper proposes a method for constructing a Tibetan text classification corpus based on a syllable-level processing technique, which we refer to as TC_TCCNL. Empirical evidence indicates that the algorithm produces promising performance and may serve as a starting point for future work.
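The abstract does not describe how TC_TCCNL segments or represents text. Purely as an illustration of what syllable-level processing of Tibetan can look like, here is a minimal sketch. It assumes that syllables are delimited by the tsheg (U+0F0B) and that clauses end with shad-type marks (U+0F0C to U+0F12). The helper names `syllabify` and `syllable_features` are hypothetical, and the bag of syllable unigrams and bigrams is an assumed feature representation, not the paper's method.

```python
import re
from collections import Counter

# Assumption: Tibetan syllables are separated by the tsheg (U+0F0B), and
# U+0F0C..U+0F12 (delimiter tsheg and the shad family) end phrases or clauses.
# Whitespace is also treated as a boundary.
_BOUNDARY_RE = re.compile(r"[\u0F0B-\u0F12\s]+")


def syllabify(text: str) -> list[str]:
    """Hypothetical helper: split Tibetan text into syllables, dropping empty pieces."""
    return [piece for piece in _BOUNDARY_RE.split(text) if piece]


def syllable_features(text: str, n: int = 2) -> Counter:
    """Hypothetical helper: count syllable unigrams plus syllable n-grams.

    Such counts could feed a standard text classifier; this is an assumed
    representation, not the one used by TC_TCCNL.
    """
    sylls = syllabify(text)
    feats = Counter(sylls)  # unigram counts
    for i in range(len(sylls) - n + 1):
        feats[" ".join(sylls[i:i + n])] += 1  # n-gram counts, space-joined
    return feats


if __name__ == "__main__":
    print(syllabify("བོད་ཀྱི་ཡི་གེ།"))  # ['བོད', 'ཀྱི', 'ཡི', 'གེ']
    print(syllable_features("བོད་ཀྱི་ཡི་གེ།").most_common(3))
```

Splitting on syllable delimiters avoids the need for a Tibetan word segmenter, because the tsheg marks syllable boundaries explicitly in the script. Whether the paper uses unigrams, n-grams, or another representation is not stated in the abstract.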
Similar resources
Corpus based coreference resolution for Farsi text
"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
Tibetan Syllable-Based Functional Chunk Boundary Identification
Tibetan syntactic functional chunk parsing is aimed at identifying syntactic constituents of Tibetan sentences. In this paper, based on the Tibetan syntactic functional chunk description system, we propose a method that groups syllables instead of performing word segmentation and tagging, and uses Conditional Random Fields (CRFs) to identify the functional chunk boundary of a sentence. Accordin...
Building Large Scale Text Corpus for Tibetan Natural Language Processing by Extracting Text from Web Pages
In this paper, we propose an approach to build a large scale text corpus for Tibetan natural language processing. We find the distribution of Tibetan web pages on the internet with a crawler that can identify whether or not a web page contains Tibetan text. The three biggest websites are selected, and topic pages are selected with a rule-based method by checking the URL. The layout structures of ...
An effective procedure for constructing a hierarchical text classification system
In text categorization tasks, classification on some class hierarchies has better results than in cases without the hierarchy. Currently, because a large number of documents are divided into several subgroups in a hierarchy, we can appropriately use a hierarchical classification method. However, we have no systematic method to build a hierarchical classification system that performs well with l...
Journal
Journal title: MATEC Web of Conferences
Year: 2021
ISSN: 2261-236X, 2274-7214
DOI: https://doi.org/10.1051/matecconf/202133606013